Method and system for using lip sequences to control operations of a device

ABSTRACT

A smart control system may determine whether a user is engaged with an electronic device, capture a lip sequence of the user for a period of time in response to determining that the user is engaged with the electronic device, generate a reduction sequence based on the captured lip sequence, determine an application with which the user is engaged, determine a current operating state of the application with which the user is engaged, determine commands that are applicable to the current operating state of the application, and determine whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands. The smart control system may send a command associated with a matching comparison sequence to the application/electronic device in response to determining that the generated reduction sequence matches a comparison sequence associated with one of the determined commands.

BACKGROUND

Modern control systems, such as universal remotes and smart speakers, include built-in microphones that allow users to interact with other smart devices or internet services using their voice. For example, smart speaker systems (e.g., Google Home®, Apple HomePod®, Amazon Echo®, etc.) typically include a speaker, a microphone and a processor configured to control other smart devices (e.g., smart TVs, smart thermostats, etc.), search the Internet for information, play music, make phone calls, and perform other similar tasks. In addition, smart speakers often include a virtual assistant, which is a software service that receives the user's voice as input, identifies a command or question, interacts with other services or devices based on the command, and generates an output or response.

The microphones in these modern control systems remain on, continuously capturing and evaluating voices to detect wake-up expressions (e.g., OK Google, Alexa, etc.) or user commands. This, and other features and characteristics of modern control systems, may present a number of privacy and data protection challenges for vendors, device manufacturers, and users of smart speaker systems.

SUMMARY

The various aspects of the disclosure provide methods of controlling a device, which may include determining by a processor in a user equipment device whether a user is engaged with an electronic device, capturing by the processor a lip sequence of the user for a period of time in response to determining that the user is engaged with the electronic device, generating by the processor a reduction sequence based on the captured lip sequence, determining by the processor an application with which the user is engaged, determining by the processor a current operating state of the application, determining by the processor commands that are applicable to the current operating state, determining by the processor whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands, and controlling by the processor in the user equipment device the device by sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches the comparison sequence associated with one of the determined commands.

In some aspects, capturing the lip sequence of the user for the period of time may include transmitting light detection and ranging (LIDAR) signals towards a face of the user, capturing reflections of the LIDAR signals off points on the face of the user, using the captured reflections to identify the points on lips of the user, and determining a polygon based on the identified points on the lips of the user. In some aspects, using the captured reflections to identify the points on the lips of the user may include using the captured reflections to identify the points and angles, and determining the polygon based on the identified points may include determining the polygon based on the identified points and the identified angles.

Some aspects may include generating a captured sequence information structure based on the determined polygon, in which generating the reduction sequence based on the captured lip sequence may include generating the reduction sequence based on the generated captured sequence information structure. In some aspects, determining the application with which the user is engaged may include selecting the application operating in the foreground of the electronic device. In some aspects, selecting the application operating in the foreground of the electronic device may include selecting a video player application operating in the foreground of the electronic device. In some aspects, determining whether the user is engaged with the electronic device may include determining whether the user is looking towards an electronic display of the electronic device.

In some embodiments, a device may be controlled by a user equipment device after determining that the user is engaged with an electronic device. For example, the device may be a DVD player or gaming system that outputs content to a television (i.e., electronic device) that the user is engaged with by watching the television. The user equipment device may be, for example, a set top box, an audio-video receiver, or a smart speaker. In such embodiments, the user equipment device may determine whether a user is engaged with the television (e.g., electronic device). The set-top box (e.g., user equipment) may capture a lip sequence of the user for a period of time in response to determining that the user is engaged with the television (e.g., electronic device). The processor of the set-top box (e.g., user equipment) may generate a reduction sequence based on the captured lip sequence. The processor of the set-top box (e.g., user equipment) may determine an application with which the user is engaged. For example, the user may be watching a movie being played from a DVD player. Thus, the application may be the output of the audio and video stored on the DVD. The processor of the set-top box (e.g., user equipment) may determine a current operating state of the application and determine commands that are applicable to the current operating state. For example, if the movie is playing, the DVD player application commands may be to stop, fast forward, rewind, pause, etc. The processor of the set-top box (e.g., user equipment) may determine whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands. The processor of the set-top box (e.g., user equipment) may control the DVD player (e.g., device) by sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches the comparison sequence associated with one of the determined commands.

In some embodiments, the device (e.g., DVD player, video streaming app such as Netflix®, Amazon Prime®, etc.) may be one and the same as the electronic device with which the user is engaged. For example, the DVD player or video streaming application may be integrated into the television (e.g., electronic device). In such embodiments, the user equipment device (e.g., a set top box, an audio-video receiver, or a smart speaker) may determine the user's engagement with the device/electronic device and provide the control for the device/electronic device.

In other embodiments, the device, electronic device, and user equipment device may also be one and the same. In such embodiments, the functionality of the device (e.g., DVD player, video streaming application, etc.) and the user equipment device (e.g., a set top box, an audio-video receiver, or a smart speaker) may be integrated into the electronic device (e.g., television).

While the various embodiments herein may be discussed from the perspective of a fully integrated device (i.e., the device, electronic device and user equipment device being integrated into a single device), the various embodiments in which the functionality may be divided among a plurality of devices are within the contemplated scope of the disclosure.

Further aspects include a user equipment device (e.g., smart television, set top box, control system device, etc.) that includes a processor configured with processor-executable instructions to perform operations of any of the methods summarized above. Further aspects include a non-transitory processor-readable storage medium having stored thereon processor-executable instructions configured to cause a processor in a user equipment device to perform operations of any of the methods summarized above. Further aspects include a user equipment device having means for accomplishing functions of any of the methods summarized above.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated herein and constitute part of this specification, illustrate examples of various embodiments, and together with the general description given above and the detailed description given below, serve to explain the features of the claims.

FIG. 1A is a component block diagram of a smart television (TV) that integrates the functionality of a device, user equipment device and electronic device into a single device suitable for implementing some embodiments.

FIG. 1B is a component block diagram of a smart television (TV) that integrates the functionality of a device and electronic device and includes a user-equipment device that may control the device and electronic device suitable for implementing some embodiments.

FIG. 1C is a component block diagram of a system that includes a device, an electronic device and a user-equipment device suitable for implementing some embodiments.

FIGS. 2A and 2B are process flow diagrams illustrating methods of using lip sequences to control the operations of an application or device in accordance with some embodiments.

FIG. 3 is a process flow diagram illustrating another method of using lip sequences to control the operations of an application or device in accordance with some embodiments.

FIGS. 4A and 4B are illustrations of an example frame or data that may be captured and analyzed by a device that is configured to use lip sequences to control the operations of an application or device in accordance with some embodiments.

FIG. 5A is a chart illustrating that a device configured in accordance with the embodiments may use captured data points (e.g., A, B, C, D) to determine/compute polygon shapes, each of which identifies a shape of the user's lips at a point in time.

FIG. 5B is a chart illustrating the relationships between a captured sequence, a reduction sequence, and a comparison (matching) sequence.

FIG. 6 is a component block diagram of a user equipment device in the form of a smartphone that is suitable for implementing some embodiments.

FIG. 7 is a component block diagram of a user equipment device in the form of a laptop that is suitable for implementing some embodiments.

FIG. 8 is a component diagram of an example server suitable for use with some embodiments.

DETAILED DESCRIPTION

Various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes, and are not intended to limit the scope of the claims.

The terms “computing device,” “electronic device,” “user equipment (UE) device” and simply “device” may be used generically and interchangeably herein to refer to any one or all of satellite or cable set top boxes, servers, rack mounted computers, routers, voice controllers, smart televisions, smart speakers, smart remote controls, smart locks, smart lighting systems, smart switches, smart plugs, smart doorbells, smart doorbell cameras, smart air pollution/quality monitors, smart alarms (e.g., smoke alarms, security systems, etc.), smart thermostats, media players (e.g., DVD players, ROKU™, AppleTV™, etc.), digital video recorders (DVRs), modems, routers, network switches, residential gateways (RG), access nodes (AN), bridged residential gateways (BRG), fixed mobile convergence products, home networking adapters and Internet access gateways that enable users to access communications service providers' services and distribute them around their house via a local area network (LAN), tablet computers, personal computers, laptop computers, netbooks, ultrabooks, smartphones, mobile devices, cellular telephones, palm-top computers, personal data assistants (PDAs), Internet-of-things (IoT) devices, smart appliances, personal or mobile multi-media players, wireless electronic mail receivers, multimedia Internet enabled cellular telephones, gaming systems (e.g., PlayStation™, Xbox™, Nintendo Switch™, etc.), head-mounted devices, and similar electronic devices which include a programmable processor, a memory and circuitry for sending and/or receiving wireless communication signals to/from wireless communication networks. While the various embodiments are particularly useful in smart devices, such as smart televisions, the embodiments are generally useful in any electronic device that includes communication circuitry for accessing cellular or wireless communication networks.

In some embodiments, a device may be controlled by a user equipment device after determining that the user is engaged with an electronic device. For example, the device may be a DVD player or gaming system that outputs content to a television (i.e., electronic device) that the user is engaged with by watching the television. The user equipment device may be, for example, a set top box, an audio-video receiver, or a smart speaker. In such embodiments, the user equipment device may determine whether a user is engaged with the television (e.g., electronic device). The set-top box (e.g., user equipment) may capture a lip sequence of the user for a period of time in response to determining that the user is engaged with the television (e.g., electronic device). The processor of the set-top box (e.g., user equipment) may generate a reduction sequence based on the captured lip sequence. The processor of the set-top box (e.g., user equipment) may determine an application with which the user is engaged. For example, the user may be watching a movie being played from a DVD player. Thus, the application may be the output of the audio and video stored on the DVD. The processor of the set-top box (e.g., user equipment) may determine a current operating state of the application and determine commands that are applicable to the current operating state. For example, if the movie is playing, the DVD player application commands may be to stop, fast forward, rewind, pause, etc. The processor of the set-top box (e.g., user equipment) may determine whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands. The processor of the set-top box (e.g., user equipment) may control the DVD player (e.g., device) by sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches the comparison sequence associated with one of the determined commands.

In some embodiments, the device (e.g., DVD player, video streaming app such as Netflix®, Amazon Prime®, etc.) may be one and the same as the electronic device with which the user is engaged. For example, the DVD player or video streaming application may be integrated into the television (e.g., electronic device). In such embodiments, the user equipment device (e.g., a set top box, an audio-video receiver, or a smart speaker) may determine the user's engagement with the device/electronic device and provide the control for the device/electronic device.

In other embodiments, the device, electronic device, and user equipment device may also be one and the same. In such embodiments, the functionality of the device (e.g., DVD player, video streaming application, etc.) and the user equipment device (e.g., a set top box, an audio-video receiver, or a smart speaker) may be integrated into the electronic device (e.g., television).

While the various embodiments herein may be discussed from the perspective of a fully integrated device (i.e., the device, electronic device and user equipment device being integrated into a single device), the various embodiments in which the functionality may be divided among a plurality of devices are within the contemplated scope of the disclosure.

A number of different methods, technologies, solutions, and/or techniques (herein collectively “solutions”) may be used for determining the location, position, or orientation of a target point (a point on the facial structure surrounding the user's lips or eyes, corner of the lips or eye, etc.), any or all of which may be implemented by, included in, and/or used by the various embodiments. Such solutions include trilateration, multi-lateration, degrees of freedom (DOF), time of arrival (TOA), time-of-flight (TOF), observed time difference of arrival (OTDOA), and angle of arrival (AOA). For example, a computing device may be configured to transmit sound (e.g., ultrasound), light or a radio signal to a target point, measure how long it takes for a reflection of the sound, light or radio signal to be detected by a sensor on the computing device, and use any or all of the above techniques (e.g., time of arrival, angle of arrival, etc.) to estimate the distance and angle between the computing device and the target point.

The phrase “six degrees of freedom (6-DOF)” may be used herein to refer to the freedom of movement of the head (or face, eyes, lips, etc.) of the user with respect to a UE device (e.g., smartphone, smart appliance, IoT device, etc.) in three-dimensional space or with respect to three perpendicular axes. The user's head may change its position in a forward/backward direction or along the X-axis (surge), in a left/right direction or along the Y-axis (sway), and in an up/down direction or along the Z-axis (heave). The head may change its orientation through rotation along the three perpendicular axes. The term “roll” may refer to rotation along the longitudinal axis or tilting side to side on the X-axis. The term “pitch” may refer to rotation along the transverse axis or tilting forward and backward on the Y-axis. The term “yaw” may refer to rotation along the normal axis or turning left and right on the Z-axis.

The terms “component,” “system,” “engine,” and the like may be used herein to refer to a computer-related entity (e.g., hardware, firmware, a combination of hardware and software, software, software in execution, etc.) that is configured to perform particular operations or functions. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computing device. By way of illustration, both an application running on a computing device and the computing device may be referred to as a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one processor or core and/or distributed between two or more processors or cores. In addition, these components may execute from various non-transitory computer readable media having various instructions and/or data structures stored thereon. Components may communicate by way of local and/or remote processes, function or procedure calls, electronic signals, data packets, memory read/writes, and other known computer, processor, and/or process related communication methodologies.

Computing device interfaces (e.g., Human Computer Interface, etc.) are becoming increasingly complex, particularly for smart appliances and IoT devices. Complex functionality results in a multitude of possible commands and functions for the user to execute. Users find it increasingly challenging to access the full functionality provided by these devices/systems. In addition to diminishing the user experience, these increasingly complex interfaces pose a challenge for service providers and system manufacturers because the training cycle associated with the corresponding devices is becoming longer, more complex, and more expensive. The increase in complexity may also increase the support costs (e.g., for onboarding existing or new users, etc.) associated with software or hardware upgrades to these devices.

Some control systems, such as smart universal remote controls and smart speakers, include built-in microphones that accept user voice input. While these systems allow users to more fully utilize the functionality provided by their devices, they also include significant limitations and disadvantages. For example, a smart speaker system (e.g., Siri, Alexa, etc.) may capture voice recordings, use artificial intelligence to translate words within the recording into text, translate the text into commands, and send the commands to other devices (e.g., TVs, thermostats, mobile devices, etc.). Such a system also commonly includes a supervised learning stage in which a human manually analyzes the recording to identify commands (e.g., to account for individual or regional differences in how certain words are pronounced, etc.). Privacy concerns start to outweigh the advantage and convenience of voice commands because such devices must constantly “listen” for potential commands. As a result, users may become wary of using such devices for fear that all of their conversations will be recorded and transmitted across a communication network. While such smart universal remote controls and smart speakers may eventually delete the data/recordings, users are unable to readily control what gets recorded, who listens to the recording, who reviews the data, how long the recording and/or its associated data is kept, whether the recordings contain private conversations or information that the user does not want to share, etc.

Further, new and emerging data protection and privacy laws, such as the General Data Protection Regulation (EU) 2016/679 (GDPR), may limit the types of information that may be recorded and sent to a cloud server for analysis. As more and more of these laws are implemented, systems that capture and evaluate voices (e.g., to detect wake-up expressions, user commands, etc.) could create liabilities for their users, vendors or device manufacturers.

In addition, a user might not have access to or wish to utilize audible voice commands in a public setting (e.g., airplane, bus, train, taxi, etc.). Even if they do, there are courtesy and privacy issues around the user sending voice commands to a screen when there are other individuals nearby (e.g., sitting next to the user in an airplane, etc.). For example, it may be rude or annoying to an individual sitting next to the user if the user provides detailed instructions (e.g., open, play, stop, menu, etc.) to their personal electronic device (e.g., smartphone, tablet, etc.). For these and other reasons, control systems such as smart speakers that capture or use the user's voice/words have not been, or will not be, fully utilized by consumers to control their devices.

Some control systems may include a “night mode” that allows users to interact with the smart universal remote controls and/or smart speakers without audible speech and/or in low light conditions. A control system that supports “night mode” may capture a user's lip (and/or mouth) movements, translate the user's lip movements to words, store the words in memory, compare the words to commands, and apply the corresponding command to the device. Because these systems do not determine the context in which a command is given (e.g., while the user is facing a smart TV and watching a movie, etc.), there could be thousands of relevant applications and millions of relevant commands. Determining the specific command that was issued by the user could consume an excessive amount of the device's often limited processing, memory or battery resources. Alternatively, the control system could send the words to a server in a cloud network for analysis. However, the transmission of the commands through a cloud communication network could be a violation of the user's privacy and/or violate a data protection or privacy law (e.g., GDPR).

The embodiments disclosed herein include control systems, which may be implemented as stand-alone devices or within another computing device or user equipment device (e.g., cable box, smart TV, etc.), that overcome the above-described limitations of existing or conventional solutions. Control systems configured in accordance with the embodiments may not require the inclusion or use of a microphone, may not be required to capture voice recordings, may not be required to translate voice to text/commands, may not be required to translate lip movements into words, may not be required to store the user's words in memory, and may not be required to transmit information collected from the user to a server or cloud network. As a result, control systems configured in accordance with the embodiments may allow users to more fully access the functionality provided by modern electronic devices and control systems without compromising their privacy or data security and/or without violating any existing or foreseeable data protection or privacy laws.

A control system configured in accordance with the embodiments may be configured to scan its physical environment (e.g., by activating a sensor, using LIDAR, etc.) to determine whether a user is engaged with an electronic device. The electronic device may be a UE device in which the control system is included or another UE device that is communicatively coupled to the control system. In response to determining that the user is engaged, the control system may capture the user's lip sequence (this may also be referred to as the user's mouth sequence) over a timeline, generate one or more captured sequences based on the user's captured lip sequences, and/or generate one or more reduction sequences. For example, the control system may transmit LIDAR signals, capture their reflections off certain points on the user's face (e.g., areas surrounding the user's mouth or lips, etc.), use the captured reflections to identify edge points of the user's lips, determine the angles between the edge points, compute polygons based on the edge points and angles captured during the timeline, generate a captured sequence based on the computed polygons, and generate a reduction sequence based on the captured sequence.
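
For illustration only, the following Python sketch shows one way the capture step described above could be realized. The `lidar.lip_edge_points()` call is a hypothetical sensor interface standing in for whatever driver the facial sensor actually exposes; it is assumed to return the four lip edge points (e.g., A, B, C, D) from one scan.

```python
import math
import time
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of a lip edge point

def interior_angles(points: List[Point]) -> List[float]:
    """Interior angle (degrees) at each vertex of the polygon formed by
    the ordered edge points (e.g., A, B, C, D around the lips)."""
    angles = []
    n = len(points)
    for i in range(n):
        prev_pt, cur, nxt = points[i - 1], points[i], points[(i + 1) % n]
        v1 = (prev_pt[0] - cur[0], prev_pt[1] - cur[1])
        v2 = (nxt[0] - cur[0], nxt[1] - cur[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, dot / norm)))))
    return angles

def capture_frame(lidar) -> dict:
    """Build one polygon 'frame' from the current LIDAR scan."""
    points = lidar.lip_edge_points()  # hypothetical driver call
    return {
        "capture_time": time.monotonic(),
        "points": points,
        "angles": interior_angles(points),
    }
```

A captured sequence is then simply the list of such frames collected over the timeline.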

The control system may be configured to identify or determine the application (e.g., video player, heating, ventilation, and air conditioning (HVAC) controller, etc.) with which the user is engaged, determine the current operating state of the application or relevant device (e.g., playing a movie, cooling, etc.), determine the commands (e.g., “PAUSE,” “STOP,” “FAN OFF,” etc.) that are applicable to the current operating state of the application/device, and compare a reduction sequence to the comparison sequences associated with the determined commands. In response to determining that a reduction sequence matches a comparison sequence, the control system may select a command associated with the matching comparison sequence, and send the selected command to the application/device to control its operations.

Each captured sequence, reduction sequence, and/or comparison sequence may be an information structure that characterizes a sequence of lip movements over a timeline. These information structures may include any or all of edge point values, vertex values, corner values, side values, angle values, capture time values, polygon values, polygon mesh values, polygon models, and/or frames. For example, the information structures may include four (4) edge points and angles that may be used to determine vertices/corners and edges/sides of a polygon shape (e.g., a quadrilateral shape, etc.). Each polygon shape may correspond to a frame that identifies a shape of the user's lips at a point in time. A series/sequence of these polygon shapes/frames over a period of time (e.g., 0.5 seconds, 0.75 seconds, 1 second, 1.25 seconds, etc.) may identify a pattern of lip/mouth movements that may correspond to a command.
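
For illustration, one plausible in-memory layout of these information structures is sketched below; the type and field names are illustrative assumptions, not definitions from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class PolygonFrame:
    """One frame: the polygon identifying the lip shape at a point in time."""
    capture_time: float   # e.g., seconds from a monotonic clock
    points: List[Point]   # four edge points (vertices/corners)
    angles: List[float]   # interior angle at each vertex, in degrees

@dataclass
class LipSequence:
    """A captured, reduction, or comparison sequence over a timeline."""
    frames: List[PolygonFrame] = field(default_factory=list)
    command: Optional[str] = None  # set on comparison sequences, e.g., "PAUSE"
```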

A reduction sequence may include a subset of the information included in a captured sequence. For example, if a captured sequence includes polygons or frames that identify the shape of the user's lip/mouth at every millisecond, the corresponding reduction sequence may include only the polygons or frames that identify significant changes in the shape of the user's lip/mouth. A comparison sequence may include reference or test values (e.g., a sequence of reference polygons, etc.) and an associated command (e.g., play, pause, cool, toast, etc.). The comparison values may be generated in a server computing device and preloaded/downloaded/stored on a local memory associated with the control system.

A control system configured in accordance with the embodiments may analyze a user's lip/mouth movement without audible commands and/or in a darkened space or total darkness. For example, the control system may use LIDAR to track specific points within the user's face, and analyze the input data points to generate output commands within a dynamic context. While the user might speak naturally, the control system does not capture, store or process the user's audible voice. In addition, all the processing is handled locally (on the control system device itself), thereby providing full privacy protection to the end user. None of the captured user commands or audio may be transmitted via a cloud communication network. Thus, any user data remains within the user's immediate environment.

FIG. 1A illustrates components in a user equipment device in the form of a smart television 100 that includes a control system that may be configured to use lip sequences to control its operations in accordance with some embodiments. In the embodiment illustrated in FIG. 1A, the functionality of the device, user equipment device and electronic device may be integrated into the singular smart television 100. In the example illustrated in FIG. 1A, the smart television 100 includes a processor 102 coupled to internal memory 104, an electronic display 106, a facial sensor 108, an image sensor 110, a sensor array 112, speakers 114, communications circuitry 116 (e.g., transceiver, a wireless radio, etc.), user interface elements 118 (e.g., lights, buttons, etc.), and an antenna 120 for sending and receiving electromagnetic radiation and/or connecting to a wireless data link.

In some embodiments, the facial sensor 108 may include a LIDAR camera that uses laser pulses to provide a three-dimensional representation of objects in front of the smart television 100. In some embodiments, the processor 102 may be a digital signal processor (embedded chip) configured to provide real-time processing of captured scans from the LIDAR camera. In some embodiments, the memory 104 may be an embedded storage system that fits within a chip. In some embodiments, the storage capacity of the memory 104 may be adjusted to handle anticipated patterns (e.g., comparison sequences, etc.) that are used for classification. In some embodiments, the communications circuitry 116 and/or antenna 120 may be used to communicate with a server in a cloud network that is configured to aggregate and generate a list of patterns used to match specific words or other pre-set 2D plot templates. The aggregated data may be used to optimize all collected patterns and send the optimized (smaller) list to the smart television 100, which may receive the optimized (smaller) list via the communications circuitry 116 and/or antenna 120 and store the received list in the memory 104.

The facial sensor 108 may be configured to acquire data from the user's facial features (e.g., the user's eyes, nose, lips, jaw, etc.), and send the acquired data to the sensor array 112 and/or processor 102. As part of these operations, the facial sensor 108 may produce or transmit any of a variety of signals, including any or all of sound navigation ranging (sonar) signals, radio detection and ranging (radar) signals, light detection and ranging (lidar) signals, sound waves (e.g., ultrasound from a piezoelectric transducer, etc.), small flashes of light (e.g., infrared light, light from a light emitting diode (LED) laser, etc.), and/or any other similar signal, wave, or transmission known in the art. The facial sensor 108 may capture the signals' reflections off one or more points on a surface of the user's face, and send the corresponding data to the sensor array 112 and/or processor 102.

The sensor array 112 and/or processor 102 may be configured to use the data received from the facial sensor 108 to isolate the user's face and/or determine whether the user intends to engage with the smart television 100. For example, the processor 102 may use the received data to determine the position, orientation or direction of the user's head, face or lips and/or to determine the gaze direction of the user's eyes. The processor 102 may determine that the user intends to engage with the smart television 100 based on the user's gaze direction (e.g., in response to determining that the user is looking directly at the electronic display 106, etc.), the direction of the user's head, the position of the user's face, the orientation of the user's lips, or any combination thereof. The sensor array 112 and/or processor 102 may also use the data received from the facial sensor 108 to generate a captured sequence.
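
A minimal sketch of one possible gaze-based engagement test follows, assuming upstream processing has already produced an estimated gaze direction and the direction from the user's face to the electronic display 106; the 15 degree tolerance is an arbitrary illustrative value, not a parameter from the specification.

```python
import math
from typing import Tuple

Vector3 = Tuple[float, float, float]

def _angle_between(v1: Vector3, v2: Vector3) -> float:
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def user_is_engaged(gaze_dir: Vector3, face_to_display: Vector3,
                    max_offset_deg: float = 15.0) -> bool:
    """Treat the user as engaged when the estimated gaze direction points
    to within max_offset_deg of the vector from the face to the display."""
    return _angle_between(gaze_dir, face_to_display) <= max_offset_deg
```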

In some embodiments, the processor 102 may be configured to use data received from the facial sensor 108 (e.g., scans from the LIDAR camera) to locate a face within the captured three-dimensional frames. In some embodiments, the facial sensor 108 may be coupled to a custom onboard chipset that could be used for fast detection.

The image sensor 110 may be configured to capture real-world images from the physical environment of the smart television 100, and send the corresponding image data to the processor 102. In some embodiments, the processor 102 may be configured to use localization and mapping techniques, such as simultaneous localization and mapping (SLAM), visual simultaneous localization and mapping (VSLAM), and/or other techniques known in the art to construct a map of the viewable environment, identify the user's face within the constructed map, and/or determine distances and angles between the user's face/lips and the image sensor 110 or smart television 100. The processor 102 may use the image information and the determined distances/angles to generate a captured sequence information structure and/or determine whether the user intends to engage with the smart television 100 to issue commands.

In some embodiments, the image sensor 110 may include a monocular image sensor that captures images or frames from the environment surrounding the image sensor 110. The processor 102 may receive the captured images, identify prominent objects or features within the captured image, estimate the dimensions and scale of the features in the image, compare the identified features to each other and/or to features in test images having known dimensions and scale, and identify correspondences based on the comparisons. Each correspondence may be a value set or an information structure that identifies a feature (or feature point) in one image as having a high probability of being the same feature in another image (e.g., a subsequently captured image). The processor 102 may use the identified correspondences to determine distances and angles between the user's face/lips and the image sensor 110 or smart television 100. The processor may generate the captured sequence information structure based on the determined distances and angles between the user's lips.

The sensor array 112 may include, or may be coupled to, any or all of the processor 102, facial sensor 108, image sensor 110, an eye-tracking sensor, an infrared (IR) sensor, an inertial measurement unit (IMU), a laser distance sensor (LDS), an optical flow sensor, and/or other sensors configured to detect the presence of a user's face, the position, orientation or direction of the user's face, movements of facial features (e.g., eye, jaw, lip, etc.), the distance between the user's face and the smart television 100, etc. For example, the sensor array 112 may include an optical flow sensor that measures optical flow or visual motion, and outputs measurements based on the optical flow/visual motion. An optical flow may identify or define the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and a scene. The sensor array 112 may generate and send an optical flow or its associated measurements to the processor 102, which may receive and use the information for motion detection, object segmentation, time-to-contact information, focus of expansion calculations, motion compensated encoding, stereo disparity measurements, and/or other similar computations or techniques. The processor 102 may use any or all such computations or techniques to determine whether the user intends to engage with the smart television 100 and/or to generate captured sequence information structures.

In various embodiments, the sensor array 112 may include any or all of a gyroscope, an accelerometer, a magnetometer, a magnetic compass, an altimeter, an odometer, a pressure sensor, an optical reader, a monocular image sensor, sensors for scanning/collecting information from the user's environment (e.g., room, etc.), geo-spatial positioning sensors (e.g., global positioning system (GPS) transceiver, etc.), sensors for monitoring physical conditions (e.g., location, motion, acceleration, orientation, altitude, etc.), distance measuring sensors (e.g., a laser, sonic range finder, etc.), orientation sensors (e.g., up, down, level, etc.), and other sensors that detect motion, gestures (e.g., hand movements) and/or lip movements. The processor 102 may be configured to use any or all such information collected by the sensor array 112 to determine whether the user is viewing or paying attention to the electronic display 106 (e.g., via information collected from a camera, motion sensor, etc.), whether the user is in close proximity to the smart television 100, or whether the user is engaged in an activity (e.g., moving, entering text in a different device, making a voice call, etc.) that indicates the user is not actively engaged with the smart television 100. The processor 102 may also be configured to use any or all of the information received from the sensor array 112 to generate a captured sequence information structure.

The communications circuitry 116 may be coupled to the processor 102 and configured to establish data connections with a network, such as a local area network, a service provider network, a cloud network or the Internet. The smart television 100 may communicate with other devices via a direct communication link (e.g., wireless data link, etc.), through a central server, via short-range radio technologies (e.g., Bluetooth®, WiFi, etc.), via peer-to-peer connections, or via any other known communication technologies. The processor 102 may be configured to receive comparison sequences from the network (e.g., cloud network, Internet, etc.) via the communications circuitry 116 and/or antenna 120. Each comparison sequence may be an information structure that includes values that identify edge points and angles that form one or more rectangular shapes. Each comparison sequence may be associated with a command (e.g., open, start, pause, cool, submit, etc.).

The user interface elements 118 may include indicator lights that show whether the processor 102 has determined that the user is currently engaged with the smart television 100. For example, a user interface element 118 may turn green to indicate that the processor 102 has determined that the user is currently engaged, and is ready to collect lip movement information to identify commands. The user interface element 118 may turn black or off to indicate that the processor 102 has determined that the user is not currently engaged, the processor 102 is not collecting lip movement information, and/or that the processor 102 is not currently attempting to identify commands based on the user's lip movements. The user interface element 118 may turn red to indicate that there has been an error, orange to indicate that the user is not lined up properly with respect to the smart television 100, etc.

The smart television 100 or its control system (e.g., processor 102, facial sensor 108, sensor array 112, etc.) may be equipped with, coupled to, or communicate with a variety of additional sensors, including a gyroscope, accelerometers, a magnetometer, a magnetic compass, an altimeter, a camera, an optical reader, an orientation sensor, a monocular image sensor, and/or similar sensors for monitoring physical conditions (e.g., location, motion, acceleration, orientation, altitude, etc.) or gathering information that is useful for employing SLAM techniques.

In the embodiment of the smart television 100 illustrated in FIG. 1A, the processor 102 in the smart television 100 may be configured to control the smart television 100 by determining whether a user is engaged with the smart television 100, capturing (e.g., by working in conjunction with the facial sensor 108, image sensor 110, sensor array 112, etc.) a lip sequence of the user for a period of time in response to determining that the user is engaged with the smart television 100, generating a reduction sequence based on the captured lip sequence, determining an application with which the user is engaged, determining a current operating state of the application, determining commands that are applicable to the current operating state, determining whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands, and sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches the comparison sequence associated with one of the determined commands.

FIGS. 1B and 1C illustrate communication networks 140, 160 that include a control system device 142 that could be configured to implement some embodiments. In particular, FIGS. 1B and 1C illustrate that the processor 102, internal memory 104, facial sensor 108, image sensor 110, sensor array 112, communications circuitry 116 and antenna 120 discussed above with reference to FIG. 1A may be implemented as part of a separate control system device 142. The control system device 142 may be an independent “stand-alone” control system or implemented as part of another user equipment device (e.g., satellite or cable set top box, smart speaker system, smart controller, etc.) that is separate from the television 144 or the component that includes the electronic display 106 with which the user engages.

In the example illustrated in FIG. 1B, the control system device 142 (i.e., user equipment device) includes indirect or direct (wired or wireless) communication links to the television 144 (e.g., device and electronic device integrated into a singular device) and access point 146. For example, the control system device 142 may send and receive information to and from the television 144 directly or indirectly via the access point 146.

With reference to FIG. 1B, the processor 102 in the control system device 142 (i.e., user equipment device) may be configured to control the television 144 by determining whether a user is engaged with the television 144 (or electronic display 106 of the television 144), capturing a lip sequence of the user (e.g., by working in conjunction with the facial sensor 108, image sensor 110, sensor array 112, etc.) for a period of time in response to determining that the user is engaged with the television 144, generating a reduction sequence based on the captured lip sequence, determining an application with which the user is engaged, determining a current operating state of the application, determining commands that are applicable to the current operating state, determining whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands, and sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches the comparison sequence associated with one of the determined commands.

In some embodiments, the control system and the device to be controlled may be integrated into a single device. For example, the control system may be integrated into a cable set top box that is coupled to a television. In these embodiments, sending the command associated with the matching comparison sequence to the application may include sending the command to an application (e.g., media player, etc.) operating on the control system device 142.

In other embodiments (e.g., embodiments in which the control system is implemented as an independent “stand-alone” control system, etc.), sending the command associated with the matching comparison sequence to the application may include sending the command to the television 144 for controlling an application (e.g., media player, etc.) operating on the television 144.

In the example illustrated in FIG. 1C, the control system device 142 (i.e., user equipment device) includes indirect or direct (wired or wireless) communication links to a cable set top box 148 (i.e., the device to be controlled or simply “device”), the television 144, and the access point 146. Further, each of the components in the communication network 160 (e.g., control system device 142, television 144, access point 146 and cable set top box 148) includes direct and/or indirect (wired or wireless) communication links to every other component in the communication network 160. As such, each component (e.g., 142-148) may send and receive information to and from every other component (e.g., 142-148) directly or indirectly (e.g., through the access point 146, etc.).

With reference to FIG. 1C, the processor 102 in the control system device 142 (i.e., user equipment device) may be configured to control the cable set top box 148 (i.e., device) by determining whether a user is engaged with the television 144 (i.e., electronic device) and/or cable set top box 148 (which could be based on user preferences or settings), capturing a lip sequence of the user (e.g., by working in conjunction with the facial sensor 108, image sensor 110, sensor array 112, etc.) for a period of time in response to determining that the user is engaged, generating a reduction sequence based on the captured lip sequence, determining an application with which the user is engaged (e.g., media player operating on the cable set top box 148, etc.), determining a current operating state of the application, determining commands that are applicable to the current operating state, determining whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands, and sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches the comparison sequence associated with one of the determined commands.

In various embodiments, sending the command associated with the matching comparison sequence to the application may include sending the command to the cable set top box 148 and/or to the television 144, either directly or via the access point 146.

FIG. 2A illustrates a method 200 of using lip sequences to control the operations of a device (e.g., smart television 100 illustrated in FIG. 1A, television 144 illustrated in FIG. 1B, cable set top box 148 illustrated in FIG. 1C, etc.) in accordance with some embodiments. Method 200 may be performed by a processor (e.g., 102) that is included in, or communicatively coupled to, the device that is controlled (e.g., smart television 100 illustrated in FIG. 1A, television 144 illustrated in FIG. 1B, cable set top box 148 illustrated in FIG. 1C, etc.). Method 200 may be performed after the processor downloads from a cloud server or otherwise receives and stores comparison sequences in a datastore. Each of the comparison sequences may be an information structure that includes edge point values, vertex values, corner values, side values, angle values, capture time values, and/or polygon values. In addition, each comparison sequence may be associated with a device, application, and/or command.

In block 202, the processor (e.g., 102) may scan its physical environment (e.g., by activating the facial sensor 108, image sensor 110, using LIDAR, etc.) and use facial recognition techniques to determine whether a user's face is present within the vicinity (e.g., in the same room as, etc.) of the device. For example, the processor may cause a facial sensor 108 to produce or transmit signals, capture their reflections, determine based on the captured reflections whether there is an object in close proximity (e.g., within 10 feet, etc.) to the device, and determine whether any detected object resembles a user's face. As another example, the processor may capture an image of the device's surrounding environment, analyze the captured image to identify prominent features within the captured image, compare the identified features to test images of a human face, and determine that a user's face is present within the vicinity of the device based on the comparisons.

In block 204, the processor may compute or determine the position, orientation, and/or direction (e.g., gaze direction, etc.) of the user's face, eyes or lips. For example, the processor may transmit LIDAR signals, capture their reflections, and use the captured reflections to determine the position, orientation, and/or direction of the user's face, eyes or lips. As another example, the processor may capture an image of the device's surrounding environment, analyze the captured image to identify prominent features within the captured image, compare the identified features to test images to estimate the dimensions and scale of the features in the image, compare the identified features to each other and/or to features in test images having known dimensions and scale, identify correspondences based on the comparisons, and use the identified correspondences to determine the position, orientation, and/or direction of the user's face, eyes or lips.

In determination block 206, the processor may determine whether the user is engaged with an electronic device (e.g., smart television 100, television 144, cable set top box 148, etc.) based on the user's gaze direction, the orientation of the user's head, the position of the user's eyes, and/or any other technique known in the art or discussed in this application. In response to determining that the user is not engaged (i.e., determination block 206=“No”), the processor may continue to continuously or periodically scan its physical environment and determine whether a user's face is present in block 202.

In response to determining that the user is engaged (i.e., determination block 206=“Yes”), the processor may identify or determine the application (e.g., video player, HVAC control system, etc.) with which the user is engaged in block 208. For example, the processor may determine that the user is engaged with a specific type of video player (e.g., Apple TV, YouTube, etc.) that is operating in the foreground of a smart TV device in block 208.

In block 210, the processor may determine the current operating state (e.g., playing a movie, cooling, etc.) of the application or its associated device. In block 212, the processor may determine the commands (e.g., “PAUSE,” “STOP,” “FAN OFF,” etc.) that are applicable to the determined current operating state. In block 214, the processor may use the determined commands to query a datastore of comparison sequences, and select a small subset of the stored comparison sequences that correspond to the determined commands.
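
For illustration, the block 214 lookup could be as simple as the sketch below, assuming the comparison sequences are stored locally keyed by their associated command (using the `LipSequence` structure sketched earlier); the names are illustrative only.

```python
from typing import Dict, List

def select_comparison_sequences(
        datastore: Dict[str, List["LipSequence"]],
        applicable_commands: List[str]) -> List["LipSequence"]:
    """Select only the stored comparison sequences whose associated
    command is applicable to the current operating state (block 214)."""
    subset: List["LipSequence"] = []
    for command in applicable_commands:
        subset.extend(datastore.get(command, []))
    return subset
```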

In block 216, the processor may capture the user's lip sequence over a timeline and generate one or more captured sequence information structures. The processor may capture the user's lip sequence and/or generate the captured sequence information structures using any or all of the techniques disclosed in this application or known in the art. For example, the processor may transmit LIDAR signals, capture their reflections off certain points on the user's face (e.g., areas surrounding the user's mouth or lips, etc.), use the captured reflections to identify edge points of the user's lips, determine the angles between the edge points, compute a polygon based on the edge points and angles, and generate the captured sequence information structure based on the computed polygon.

In block 218, the processor may generate a reduction sequence based on the user's lip sequences or the generated captured sequence information structures. The reduction sequences may include a subset of the information included in the captured sequences. For example, if the captured sequences include polygons or image frames that identify the shape of the user's mouth at every millisecond, the reduction sequences may include polygons or image frames that identify significant changes in the shape of the user's mouth.

In some embodiments, the processor may generate a reduction sequence in block 218 by selecting a polygon value of a first record included in a sequentially ordered list of captured sequence information structures, adding the selected polygon value and its capture time to a list of reduction sequences, and traversing the sequentially ordered list of captured sequence information structures to determine whether the differences between the selected polygon value and the polygon values of subsequent records exceed a threshold. In response to determining that the difference between the selected polygon value and a polygon value of a subsequent record exceeds the threshold, the processor may add that polygon value and its capture time as the next record in the list of reduction sequences. The processor may select the most-recently added polygon value and repeat the operations above until all the captured sequences have been evaluated.
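
The following Python sketch implements the reduction described in this paragraph, assuming the `PolygonFrame` structure sketched earlier; `polygon_difference()` is one hypothetical difference metric (summed displacement of corresponding edge points), and the threshold would be tuned empirically.

```python
import math
from typing import List

def polygon_difference(a: "PolygonFrame", b: "PolygonFrame") -> float:
    """Hypothetical metric: total displacement of corresponding edge points."""
    return sum(math.hypot(p1[0] - p2[0], p1[1] - p2[1])
               for p1, p2 in zip(a.points, b.points))

def reduce_sequence(captured: List["PolygonFrame"],
                    threshold: float) -> List["PolygonFrame"]:
    """Keep the first frame, then only frames that differ from the most
    recently kept frame by more than threshold (block 218)."""
    if not captured:
        return []
    kept = [captured[0]]            # the first record seeds the reduction
    for frame in captured[1:]:
        if polygon_difference(kept[-1], frame) > threshold:
            kept.append(frame)      # significant change: keep this frame
    return kept
```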

In determination block 220, the processor may determine whether the generated reduction sequence matches any of the comparison sequences selected in block 214. In response to determining that the generated reduction sequence does not match any of the selected comparison sequences (i.e., determination block 220=“No”), the processor may continue to continuously or periodically scan its physical environment and determine whether a user's face is present in block 202. In response to determining that the generated reduction sequence matches one of the selected comparison sequences (i.e., determination block 220=“Yes”), the processor may select the command associated with the matching comparison sequence in block 222. In block 224, the processor may send the selected command to the device and/or application.

FIG. 2B illustrates another method 250 of using lip sequences to control the operations of a device (e.g., smart television 100 illustrated in FIG. 1A, television 144 illustrated in FIG. 1B, cable set top box 148 illustrated in FIG. 1C, etc.) in accordance with some embodiments. Method 250 may be performed by a processor (e.g., 102) that is included in, or communicatively coupled to, the device that is controlled (e.g., smart television 100 illustrated in FIG. 1A, television 144 illustrated in FIG. 1B, cable set top box 148 illustrated in FIG. 1C, etc.). Method 250 may be performed after the processor downloads from a cloud server or otherwise receives and stores comparison sequences in a datastore.

In block 252, the processor may determine whether a user is engaged with an electronic device. For example, the processor may work in conjunction with the facial/image sensors of the electronic device to determine that the user is looking towards an electronic display of the electronic device, and thus that the user is engaged with the electronic device. In block 254, the processor may capture a lip sequence of the user for a period of time in response to determining that the user is engaged with the electronic device. For example, the processor may work in conjunction with a facial sensor to transmit LIDAR signals towards a face of a user, capture reflections of the LIDAR signals off points on the face of the user, use the captured reflections to identify points (and angles) on lips of the user, and determine a polygon based on the identified points (and angles).

In some embodiments, as part of the operations in block 254, the processor may generate a captured sequence information structure based on the determined polygon and/or to include the determined polygon. As discussed above, a captured sequence information structure may characterize a sequence of lip movements over a timeline via a plurality of values, which may include any or all of edge point values, vertex values, corner values, side values, angle values, capture time values, polygon values, and/or frames.

In block 256, the processor may generate a reduction sequence based on the captured lip sequence. For example, the processor may generate the reduction sequence based on the captured sequence information structure generated in block 254. In block 258, the processor may determine an application with which the user is engaged. For example, the processor may select an application (e.g., video player, audio player, web-app, etc.) operating in the foreground of the electronic device (or displayed on the foreground of the electronic display, running on the processor, consuming the most processor cycles, issuing application programming interface (API) calls to a display component, etc.).

In block 260, the processor may determine a current operating state of the application (e.g., playing a movie, displaying a media guide, paused on audio, etc.). In block 262, the processor may determine commands that are applicable to the application in its current operating state. For example, if the application is a media player, its associated commands may include any or all of play, pause, toggle, reverse, rewind, fast forward, stop, off, exit, skip to the start or previous file/track/chapter, skip to the end or next file/track/chapter, record, eject, shuffle, repeat, info, menu, guide, reload, and refresh. If the media player is in a paused operating state, the commands that are applicable to the current operating state may simply include play, stop, off, rewind, and fast forward.
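
By way of example but not limitation, the mapping from an application and its current operating state to the applicable commands could be held in a simple lookup table, as in the following sketch; the application, state, and command names are hypothetical:

```python
# Hypothetical (application, state) -> applicable-commands table,
# mirroring the media-player example above. Names are illustrative only.
APPLICABLE_COMMANDS = {
    ("media_player", "playing"): ["pause", "stop", "off", "rewind",
                                  "fast_forward", "menu", "info"],
    ("media_player", "paused"):  ["play", "stop", "off", "rewind",
                                  "fast_forward"],
}

def commands_for(application, state):
    """Commands applicable to the application's current operating state."""
    return APPLICABLE_COMMANDS.get((application, state), [])
```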

In block 264, the processor may determine whether the generated reduction sequence matches a comparison sequence associated with one of the determined commands. The processor may determine that a reduction sequence matches a comparison sequence in response to determining that all or many of the polygons in the reduction sequence are the same or similar to their corresponding polygons (matched up in time or sequence) in the comparison sequence. In some embodiments, the processor may be configured to determine that a first polygon (e.g., in a reduction sequence) is similar to a second polygon (e.g., in a comparison sequence) in response to determining that their corresponding angles are congruent and the measures of their corresponding sides are proportional. In block 266, the processor may send a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence matches a comparison sequence associated with one of the determined commands.
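
A similarity test of this kind (congruent corresponding angles, proportional corresponding sides) might be sketched as follows; the tolerances and the "all or many" threshold are assumptions made for illustration:

```python
import math

def polygons_similar(angles_a, sides_a, angles_b, sides_b,
                     angle_tol=math.radians(5.0), ratio_tol=0.1):
    """Treat two polygons as similar when corresponding angles are
    congruent within angle_tol and corresponding side lengths are
    proportional within ratio_tol. Tolerances are illustrative."""
    if len(angles_a) != len(angles_b) or len(sides_a) != len(sides_b):
        return False
    if any(abs(a - b) > angle_tol for a, b in zip(angles_a, angles_b)):
        return False
    ratios = [a / b for a, b in zip(sides_a, sides_b)]  # assumes sides_b > 0
    return max(ratios) - min(ratios) <= ratio_tol

def sequences_match(reduction, comparison, min_fraction=0.8):
    """Match when 'all or many' time-aligned polygons are similar.
    Each sequence element is an (angles, sides) pair; min_fraction is
    a hypothetical threshold for 'many'."""
    pairs = list(zip(reduction, comparison))
    if not pairs:
        return False
    hits = sum(polygons_similar(r[0], r[1], c[0], c[1]) for r, c in pairs)
    return hits / len(pairs) >= min_fraction
```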

As part of the operations in blocks 202-224 and 252-266, the processor may use any number of different solutions (e.g., trilateration, degrees of freedom (DOF), time of flight (TOF), angle of arrival (AOA), etc.) to determine distances and the location, position, orientation, etc. of the user's head or facial features. For example, the processor may determine a distance between the computing device (or the controlled device) and the head, face, lips or eyes of the user based on a TOF measurement. As another example, the processor may perform a 6-DOF computation based on the captured reflections to determine a surge, sway, heave, roll, pitch and yaw of the user's head or facial features (e.g., eyes, lips, etc.). The processor may use these computations (e.g., surge, sway, heave, roll, pitch and yaw) to determine the precise orientation of the user's head, to determine whether the user is engaged with the electronic device (e.g., in blocks 206, 252, etc.), to process the collected lip sequence data, to generate the polygons or polygon meshes (e.g., a collection of vertices, edges and sides that define a polyhedral object in three-dimensional computer graphics, etc.), to generate the captured sequence information structures, and to adjust the lip sequence or polygon data when comparing the captured or reduction sequences to the comparison sequences (e.g., in blocks 220, 264, etc.).
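
For example, a TOF-based distance estimate follows directly from the round-trip time of a reflected LIDAR pulse, as the following minimal sketch illustrates:

```python
# Time-of-flight distance: the pulse travels to the face and back, so
# the one-way distance is half the round trip at the speed of light.
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_distance_m(round_trip_time_s: float) -> float:
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0

# A 4 ns round trip corresponds to roughly 0.6 m:
print(tof_distance_m(4e-9))  # ~0.5996 m
```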

FIG. 3 illustrates yet another method 300 of using lip sequences to control the operations of a device (e.g., smart television 100 illustrated in FIG. 1A, television 144 illustrated in FIG. 1B, cable set top box 148 illustrated in FIG. 1C, etc.) in accordance with some embodiments. Method 300 may be performed by a processor (e.g., 102) that is included in, or communicatively coupled to, the device that is controlled (e.g., smart television 100 illustrated in FIG. 1A, television 144 illustrated in FIG. 1B, cable set top box 148 illustrated in FIG. 1C, etc.). Method 300 may be performed after the processor downloads from a cloud server or otherwise receives and stores comparison sequences in a datastore.

In block 302, the processor (e.g., 102) may use scans from a LIDAR sensor/module to locate a face within captured three-dimensional frames. In block 304, the processor may use the LIDAR data to identify/detect a lip area of the user. To improve performance, the processor may capture or process only a minimal number of points in blocks 302 and/or 304. For example, four points could be used (the two lip edges and the mid-points of the upper and lower lips) to determine the lip area.
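
As an illustration of this four-point representation, the sketch below orders the labeled points into a polygon and approximates the lip area via the shoelace formula; the input format and helper names are assumptions, not part of the described embodiments:

```python
# Four labeled lip points: A and C are the lip corners, B and D are the
# mid-points of the upper and lower lips (cf. FIG. 4B). The mapping
# format {label: (x, y, z)} is an assumption for illustration.

def lip_polygon(points):
    """Return the four tracked vertices in A-B-C-D polygon order."""
    return [points[label] for label in ("A", "B", "C", "D")]

def lip_area_xy(polygon):
    """Approximate area of the polygon's x/y projection using the
    shoelace formula."""
    total = 0.0
    for i in range(len(polygon)):
        x1, y1, _ = polygon[i]
        x2, y2, _ = polygon[(i + 1) % len(polygon)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```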

In block 306, the processor may determine a sequence position for the user's lips. For example, the processor may determine that the system is at a start position when it starts to capture lip frames. The processor may tag a frame as an end position when the captured sequence is classified as a pre-set command. All sequences in-between may be stored in a local database with a pre-set threshold. If no command can be extracted, the processor may free up the memory and reset the sequence.

In block 308, the processor may analyze the sequence. In some embodiments, the processor may analyze the sequence so as to save power and achieve optimum performance. For example, the processor may capture a sequence of snapshots of the lips through a timeline, drop any frame that does not differ from the prior one by a preset threshold (e.g., to ensure only a minimal number of frames are captured and later processed), and classify the remaining captured frames. The processor may classify the frames by comparing them with a preset number of sequences stored in the local memory/database. The preset sequences may be downloaded from a cloud network on a periodic basis. The preset sequences used by the processor may be within a pre-determined context linked to the displayed screen. For instance, if the screen is playing a movie, then the preset sequences used by the processor would only include: “PAUSE”, “STOP”, “MENU”, etc. This may greatly reduce the number of templates/sequences that the processor evaluates during classification, thereby reducing power consumption and improving processing/classification times.
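
The frame-dropping and context-limited classification described above might look like the following sketch; the difference metric, the context table, and the matcher are placeholders for whatever an implementation actually uses:

```python
# Assumptions: frames are flat numeric tuples of tracked lip points,
# preset_sequences maps a screen context to {command: reference_frames},
# and matcher() stands in for the sequence comparison in block 308.

def frame_difference(a, b):
    """Hypothetical per-frame difference (e.g., summed point motion)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def filter_frames(frames, threshold):
    """Drop frames that differ from the last kept frame by less than
    the preset threshold, so fewer frames are processed later."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        if frame_difference(kept[-1], frame) >= threshold:
            kept.append(frame)
    return kept

def classify(frames, context, preset_sequences, matcher):
    """Evaluate only the sequences linked to the current context (e.g.,
    a playing movie limits candidates to PAUSE/STOP/MENU)."""
    for command, reference in preset_sequences.get(context, {}).items():
        if matcher(frames, reference):
            return command
    return None  # no command extracted; the caller resets the sequence
```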

In block 310, the processor may validate the classification, sequence and/or command. The validation operation may ensure that the captured sequence corresponds to a command that is valid within the context. If validation fails, the processor may reject the sequence and reset. If the validation is successful, the processor may generate the command.

In block 312, the processor (or a corresponding device processor) may implement the generated command. For example, the processor may send a “PAUSE”, “STOP”, or “MENU” command to a video system to be processed.

Although methods 200 and 300 may be performed on the ‘Edge,’ where all processing is handled on the control system/smart device, certain operations could be handled in the cloud. For example, a server in a cloud network may store and transmit a pre-set of sequences to the Edge devices so they can be used locally. In addition, a device configured in accordance with the embodiments may transmit certain sequences to the cloud network for use in further enhancing the experience. For example, a server in the cloud network may use the data received from the device to eliminate duplicate sequences, which could reduce the size of the downloaded dataset.
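
The server-side pruning mentioned above could be as simple as canonicalizing and de-duplicating the reported sequences; the canonical-form function below is an illustrative assumption:

```python
# Assumption: a sequence is a list of polygon tuples; rounding provides
# a canonical form so near-identical sequences compare equal.

def canonical_form(sequence):
    return tuple(tuple(round(v, 2) for v in polygon) for polygon in sequence)

def deduplicate(sequences):
    """Keep one representative per canonical form, shrinking the
    dataset that Edge devices later download."""
    seen, unique = set(), []
    for seq in sequences:
        key = canonical_form(seq)
        if key not in seen:
            seen.add(key)
            unique.append(seq)
    return unique
```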

Although the system can operate without user training, a manual training process may be triggered to account for variance in regional dialects and even languages. A control system/smart device configured in accordance with the embodiments may trigger this process when the user selects an option corresponding to this task. The collected data could also be sent to the cloud to enhance the performance of the system by enabling the system to accurately read commands from other users who speak the same way.

FIGS. 4A and 4B illustrate an example frame 402 that may be captured by a control system/smart device configured in accordance with the embodiments. FIG. 4A illustrates that the control system/smart device may locate a user's face area 404 within a captured frame 402, and locate eye areas 406 and/or a lip area 408 within the face area 404. FIG. 4B illustrates that the device may capture or process a number of points (e.g., four points A, B, C, D) that represent the lips of the user. In the example illustrated in FIG. 4B, the points represent the edges of the lips (e.g., points A and C), the mid-point for the upper lip (e.g., point B), and the mid-point for the lower lip (e.g., point D).

FIG. 5A illustrates that a device configured in accordance with the embodiments may use the points (e.g., A, B, C, D) representing the lips of the user to determine/compute a polygon shape 502 (e.g., a quadrilateral shape, etc.) that identifies a shape of the user's lips at a point in time (e.g., t=0.00, etc.). A captured sequence 504 of polygon shapes over a period of time (e.g., 0.8 seconds in FIG. 5A) may identify a pattern of captured lip movements that could correspond to a command.

FIG. 5B illustrates the relationship between the captured sequence 504, a reduction sequence 506, and a comparison sequence 508. In particular, FIG. 5B illustrates that the reduction sequence 506 may include a subset of the polygons included in the captured sequence 504. For example, the captured sequence 504 includes polygons that identify the shape of the user's mouth at 0.0, 0.2, 0.4, 0.6 and 0.8 seconds. The corresponding reduction sequence 506 includes only the polygons that identify the shape of the user's mouth at 0.0, 0.4, and 0.8 seconds, each of which identifies a significant change in the shape of the user's mouth. The comparison sequence 508 includes reference or test values (e.g., a sequence of reference polygons, etc.) to which the reduction sequence 506 may be compared.

Various embodiments may be implemented on a variety of computing devices, an example of which in the form of a smartphone 600 is illustrated in FIG. 6. A smartphone 600 may include a processor 601 coupled to a facial sensor, an image sensor, a sensor array, and other components so that the processor 601 may perform any of the processing in any of the methods 200 or 300. In the example illustrated in FIG. 6, the processor 601 is coupled to an internal memory 602, a display 603, a speaker 604, an antenna 605, and a wireless transceiver 606 for sending and receiving wireless signals. Smartphones 600 typically also include menu selection buttons or rocker switches 607 for receiving user inputs.

A typical smartphone 600 also includes a sound encoding/decoding (CODEC) circuit 608, which digitizes sound received from a microphone into data packets suitable for wireless transmission and decodes received sound data packets to generate analog signals that are provided to the speaker to generate sound. Also, one or more of the processor 601, wireless transceiver 606 and CODEC 608 may include a digital signal processor (DSP) circuit (not shown separately).

Various embodiments may be implemented in the laptop computer 700 illustrated in FIG. 7. A laptop computer 700 may include a processor 701 coupled to a facial sensor, an image sensor, a sensor array, and other components so that the processor 701 may perform any of the processing in any of the methods 200 or 300. The processor 701 may be coupled to volatile memory 702 and a large capacity nonvolatile memory, such as a disk drive 703 or Flash memory. The laptop computer 700 may also include a floppy disc drive 704 coupled to the processor 701. The laptop computer 700 may also include a number of connector ports 705 or other network interfaces coupled to the processor 701 for establishing data connections, such as Universal Serial Bus (USB) or FireWire® connector sockets, or other network connection circuits for coupling the processor 701 to a network (e.g., a communications network). In a notebook configuration, the laptop computer 700 may include a touchpad 706, a keyboard 707, and a display 708, all coupled to the processor 701. Other configurations of computing devices may include a computer mouse or trackball coupled to the processor (e.g., via a USB input) as are well known, which may also be used in conjunction with various embodiments.

Some embodiments may be implemented on any of a variety of commercially available server devices deployed in a cloud network, such as the server device 800 illustrated in FIG. 8. Such a server device 800 may include a processor 801 coupled to volatile memory 802 and a large capacity nonvolatile memory, such as a disk drive 803. The processor 801 may store and/or transmit a pre-set of sequences to the user devices (e.g., devices 100, 600, 700, etc.) so they can be used locally by those devices. The processor 801 may use data received from user devices to eliminate duplicate sequences, which could reduce the size of the downloaded dataset. The server device 800 may also include a floppy disc drive, compact disc (CD) or DVD disc drive 804 coupled to the processor 801. The server device 800 may also include network access ports 806 coupled to the processor 801 for establishing data connections with a network connection circuit 805 and a communication network (e.g., internet protocol (IP) network) coupled to other communication system network elements.

The processors 102, 601, 701, 801 may be any programmable microprocessor, microcomputer or multiple processor chip or chips that may be configured by software instructions (applications) to perform a variety of functions, including the functions of the various embodiments described in this application. In some mobile devices, multiple processors may be provided, such as one processor dedicated to wireless communication functions and one processor dedicated to running other applications. Typically, software applications may be stored in the internal memory before they are accessed and loaded into the processor. The processors 102, 601, 701, 801 may include internal memory sufficient to store the application software instructions.

Various embodiments illustrated and described are provided merely as examples to illustrate various features of the claims. However, features shown and described with respect to any given embodiment are not necessarily limited to the associated embodiment and may be used or combined with other embodiments that are shown and described. Further, the claims are not intended to be limited by any one example embodiment. For example, one or more of the operations of the methods may be substituted for or combined with one or more operations of the methods.

The foregoing method descriptions and the process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the operations of various embodiments must be performed in the order presented. As will be appreciated by one of skill in the art, the order of operations in the foregoing embodiments may be performed in any order. Words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the operations; these words are used to guide the reader through the description of the methods. Further, any reference to claim elements in the singular, for example, using the articles “a,” “an,” or “the” is not to be construed as limiting the element to the singular.

Various illustrative logical blocks, functional components, functionality components, circuits, and algorithm operations described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, functional components, circuits, and operations have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such embodiment decisions should not be interpreted as causing a departure from the scope of the claims.

The hardware used to implement the various illustrative logics, logical blocks, functional components, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Alternatively, some operations or methods may be performed by circuitry that is specific to a given function.

In one or more embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable medium or a non-transitory processor-readable medium. The operations of a method or algorithm disclosed herein may be embodied in a processor-executable software module that may reside on a non-transitory computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable storage media may be any storage media that may be accessed by a computer or a processor. By way of example but not limitation, such non-transitory computer-readable or processor-readable media may include RAM, ROM, EEPROM, FLASH memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of non-transitory computer-readable and processor-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.

The preceding description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the claims. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and implementations without departing from the scope of the claims. Thus, the present disclosure is not intended to be limited to the embodiments and implementations described herein, but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.

What is claimed is:
1. A method of controlling a device, comprising: determining, by a processor in a user equipment device, whether a user is engaged with an electronic device; capturing, by the processor in the user equipment device, a lip sequence of the user for a period of time in response to determining that the user is engaged with the electronic device, wherein capturing the lip sequence of the user for the period of time comprises: transmitting light detection and ranging (LIDAR) signals towards a face of the user; capturing reflections of the LIDAR signals off points on the face of the user; and using the captured reflections to identify the points on lips of the user; generating, by the processor in the user equipment device, a reduction sequence based on the captured lip sequence, wherein the generated reduction sequence is an information structure that includes a subset of the information included in the captured lip sequence; determining, by the processor in the user equipment device, an application with which the user is engaged; determining, by the processor in the user equipment device, a current operating state of the application; determining, by the processor in the user equipment device, commands that are applicable to the current operating state; determining, by the processor in the user equipment device, whether the generated reduction sequence information structure matches a comparison sequence associated with one of the determined commands; and controlling, by the processor in the user equipment device, the device by sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence information structure matches the comparison sequence associated with one of the determined commands.
2. The method of claim 1, wherein capturing the lip sequence of the user for the period of time comprises: determining a polygon based on the identified points on the lips of the user.
3. The method of claim 2, wherein: using the captured reflections to identify the points on the lips of the user comprises using the captured reflections to identify the points and angles; and determining the polygon based on the identified points comprises determining the polygon based on the identified points and the identified angles.
4. The method of claim 2, further comprising generating a captured sequence information structure based on the determined polygon, wherein generating the reduction sequence based on the captured lip sequence comprises generating the reduction sequence based on the generated captured sequence information structure.
5. The method of claim 1, wherein determining the application with which the user is engaged comprises selecting the application operating in the foreground of the electronic device.
6. The method of claim 5, wherein selecting the application operating in the foreground of the electronic device comprises selecting a video player application operating in the foreground of the electronic device.
7. The method of claim 1, wherein determining whether the user is engaged with the electronic device comprises determining whether the user is looking towards an electronic display of the electronic device.
8. The method of claim 1, wherein: the user equipment device is a stand-alone control system device; the electronic device is a smart television; and the device is a set top box.
9. The method of claim 1, wherein: the user equipment device is a set top box; and the electronic device and the device are combined into a single component, wherein the single component is a smart television device.
10. The method of claim 1, wherein the user equipment device, the electronic device and the device are combined into a single component, wherein the single component is a smart television device.
11. A user equipment device, comprising: a light detection and ranging (LIDAR) signal transmitter and receiver; and a processor configured with processor-executable instructions to: determine whether a user is engaged with an electronic device; capture a lip sequence of the user for a period of time in response to determining that the user is engaged with the electronic device, by: transmitting light detection and ranging (LIDAR) signals towards a face of the user; capturing reflections of the LIDAR signals off points on the face of the user; and using the captured reflections to identify the points on lips of the user; generate a reduction sequence based on the captured lip sequence, wherein the generated reduction sequence is an information structure that includes a subset of the information included in the captured lip sequence; determine an application with which the user is engaged; determine a current operating state of the application; determine commands that are applicable to the current operating state; determine whether the generated reduction sequence information structure matches a comparison sequence associated with one of the determined commands; and control a device by sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence information structure matches the comparison sequence associated with one of the determined commands.
12. The user equipment device of claim 11, wherein the processor is configured to capture the lip sequence of the user for the period of time by: determining a polygon based on the identified points on the lips of the user.
13. The user equipment device of claim 12, wherein the processor is configured to: use the captured reflections to identify the points on the lips of the user by using the captured reflections to identify the points and angles; and determine the polygon based on the identified points by determining the polygon based on the identified points and the identified angles.
14. The user equipment device of claim 12, wherein: the processor is further configured to generate a captured sequence information structure based on the determined polygon; and the processor is configured to generate the reduction sequence based on the captured lip sequence by generating the reduction sequence based on the generated captured sequence information structure.
15. The user equipment device of claim 11, wherein the processor is configured to determine the application with which the user is engaged by selecting the application operating in the foreground of the electronic device.
16. The user equipment device of claim 15, wherein the processor is configured to select the application operating in the foreground of the electronic device by selecting a video player application operating in the foreground of the electronic device.
17. The user equipment device of claim 11, wherein the processor is configured to determine whether the user is engaged with the electronic device by determining whether the user is looking towards an electronic display of the electronic device.
18. The user equipment device of claim 11, wherein: the user equipment device is a stand-alone control system device; the electronic device is a smart television; and the device is a set top box.
19. The user equipment device of claim 11, wherein: the user equipment device is a set top box; and the electronic device and the device are integrated into a single component, wherein the single component is a smart television device.
20. The user equipment device of claim 11, wherein the user equipment device, the electronic device and the device are integrated into a single component, wherein the single component is a smart television device.
21. A non-transitory computer readable storage medium having stored thereon processor-executable software instructions configured to cause a processor in a user equipment device to perform operations for controlling a device, the operations comprising: determining whether a user is engaged with an electronic device; capturing a lip sequence of the user for a period of time in response to determining that the user is engaged with the electronic device, wherein capturing the lip sequence of the user for the period of time comprises: transmitting light detection and ranging (LIDAR) signals towards a face of the user; capturing reflections of the LIDAR signals off points on the face of the user; and using the captured reflections to identify the points on lips of the user; generating a reduction sequence based on the captured lip sequence, wherein the generated reduction sequence is an information structure that includes a subset of the information included in the captured lip sequence; determining an application with which the user is engaged; determining a current operating state of the application; determining commands that are applicable to the current operating state; determining whether the generated reduction sequence information structure matches a comparison sequence associated with one of the determined commands; and controlling the device by sending a command associated with a matching comparison sequence to the application in response to determining that the generated reduction sequence information structure matches the comparison sequence associated with one of the determined commands.
22. The non-transitory computer readable storage medium of claim 21, wherein the stored processor-executable software instructions are configured to cause the processor to perform operations such that capturing the lip sequence of the user for the period of time comprises: determining a polygon based on the identified points on the lips of the user.
23. The non-transitory computer readable storage medium of claim 22, wherein the stored processor-executable software instructions are configured to cause the processor to perform operations such that: using the captured reflections to identify the points on the lips of the user comprises using the captured reflections to identify the points and angles; and determining the polygon based on the identified points comprises determining the polygon based on the identified points and the identified angles.
24. The non-transitory computer readable storage medium of claim 22, wherein: the stored processor-executable software instructions are configured to cause the processor to perform operations further comprising generating a captured sequence information structure based on the determined polygon; and the stored processor-executable software instructions are configured to cause the processor to perform operations such that generating the reduction sequence based on the captured lip sequence comprises generating the reduction sequence based on the generated captured sequence information structure.
25. The non-transitory computer readable storage medium of claim 21, wherein the stored processor-executable software instructions are configured to cause the processor to perform operations such that determining the application with which the user is engaged comprises selecting the application operating in the foreground of the electronic device.
26. The non-transitory computer readable storage medium of claim 25, wherein the stored processor-executable software instructions are configured to cause the processor to perform operations such that selecting the application operating in the foreground of the electronic device comprises selecting a video player application operating in the foreground of the electronic device.
27. The non-transitory computer readable storage medium of claim 21, wherein the stored processor-executable software instructions are configured to cause the processor to perform operations such that determining whether the user is engaged with the electronic device comprises determining whether the user is looking towards an electronic display of the electronic device.
28. The non-transitory computer readable storage medium of claim 21, wherein: the user equipment device is a stand-alone control system device; the electronic device is a smart television; and the device is a set top box.
29. The non-transitory computer readable storage medium of claim 21, wherein: the user equipment device is a set top box; and the electronic device and the device are integrated into a single component, wherein the single component is a smart television device.
30. The non-transitory computer readable storage medium of claim 21, wherein the user equipment device, the electronic device and the device are integrated into a single component, wherein the single component is a smart television device.